# Long-video event capture

FastVLM 0.5B Stage3

FastVLM-0.5B-Stage3 is an efficient multimodal language model with visual understanding and language processing capabilities. It can process long videos and generate structured outputs.

Transformers English

FastVLM 0.5B Stage2

FastVLM-0.5B-Stage2 is an efficient multimodal language model capable of understanding visual content and handling text tasks.

Multimodal Fusion

Transformers English

Qwen2.5 VL 32B Instruct Exl2 4 25bpw

Qwen2.5-VL-32B-Instruct is the latest vision-language model in the Qwen family, with strong multimodal understanding and generation capabilities. It accepts images, videos, and text as input.

Transformers English

christopherthompson81

© 2025 AIbase